Learning expressive human-like head motion sequences from speech

Authors

  • Carlos Busso
  • Zhigang Deng
  • Ulrich Neumann
  • Shrikanth Narayanan
Abstract

With the development of new trends in human-machine interfaces, animated feature films and video games, better avatars and virtual agents are required that more accurately mimic how humans communicate and interact. Gestures and speech are jointly used to express intended messages. The tone and energy of the speech, facial expression, rigid head motion and hand motion combine in a non-trivial manner as they unfold in natural human interaction. Given that the use of large motion capture datasets is expensive and can only be applied in planned scenarios, new automatic approaches are required to synthesize realistic animations that capture and resemble the complex relationship between these communicative channels. One useful and practical approach is the use of acoustic features to generate gestures, exploiting the link between gestures and speech. Since the shape of the lips is determined by the underlying articulation, acoustic features have been used to generate visual visemes that match the spoken sentences [4, 5, 12, 17]. Likewise, acoustic features have been used to synthesize facial expressions [11, 30], exploiting the fact that the same muscles used for articulation also affect the shape of the face [44, 46]. One important gesture that has received less attention than other aspects of facial animation is rigid head motion. Head motion is important not only to acknowledge active listening or to replace verbal information (e.g., a "nod"), but also for many aspects of human …
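The abstract points to the correlation between prosodic cues (the tone and energy of speech) and gestures such as rigid head motion. As a rough illustration of the kind of frame-level acoustic features such speech-driven systems typically start from, the sketch below extracts pitch (F0) and RMS energy with librosa; the file name, sampling rate and hop size are hypothetical, and this is not the feature pipeline of the paper itself.

```python
# Illustrative only: frame-level prosodic features (pitch and energy) of the
# kind commonly used as input to speech-driven head motion synthesis.
# The audio file, sampling rate and hop size below are hypothetical.
import librosa
import numpy as np

def prosodic_features(wav_path, sr=16000, hop=160):
    """Return an (n_frames, 2) array of [f0, rms_energy], one row per 10 ms frame."""
    y, sr = librosa.load(wav_path, sr=sr)
    f0 = librosa.yin(y, fmin=60, fmax=400, sr=sr, hop_length=hop)
    rms = librosa.feature.rms(y=y, hop_length=hop)[0]
    n = min(len(f0), len(rms))
    return np.stack([f0[:n], rms[:n]], axis=1)

features = prosodic_features("speech_sample.wav")  # hypothetical file
print(features.shape)
```

Sequences of such feature vectors, time-aligned with head rotation data from motion capture, are what a speech-to-gesture mapping would be trained on.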

Similar articles

Predicting Head Pose from Speech with a Conditional Variational Autoencoder

Natural movement plays a significant role in realistic speech animation. Numerous studies have demonstrated the contribution visual cues make to how acceptable we, as human observers, find an animation. Rigid head motion is one visual mode that universally co-occurs with speech, so it is a reasonable strategy to seek a transformation from the speech mode to predict the head pose. Seve...
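As a minimal, self-contained sketch of the conditional variational autoencoder idea (in PyTorch), the snippet below encodes a per-frame head-pose vector conditioned on an acoustic feature vector and decodes it back; every dimension, layer size and loss term here is an illustrative assumption and does not reproduce the cited model.

```python
# Minimal conditional VAE sketch: head-pose vector x, acoustic condition c.
# All dimensions are illustrative assumptions, not the paper's architecture.
import torch
import torch.nn as nn

class CVAE(nn.Module):
    def __init__(self, pose_dim=3, cond_dim=26, latent_dim=8, hidden=64):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(pose_dim + cond_dim, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, latent_dim)
        self.logvar = nn.Linear(hidden, latent_dim)
        self.dec = nn.Sequential(
            nn.Linear(latent_dim + cond_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, pose_dim))

    def forward(self, x, c):
        h = self.enc(torch.cat([x, c], dim=-1))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
        return self.dec(torch.cat([z, c], dim=-1)), mu, logvar

def cvae_loss(x, x_hat, mu, logvar):
    recon = nn.functional.mse_loss(x_hat, x)                        # reconstruction
    kld = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())  # KL term
    return recon + kld
```

At synthesis time, sampling z from the prior and decoding it together with the speech condition yields a head pose for each frame.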

Bidirectional LSTM Networks Employing Stacked Bottleneck Features for Expressive Speech-Driven Head Motion Synthesis

Previous work in speech-driven head motion synthesis is centred around Hidden Markov Model (HMM) based methods and data that does not show a large variability of expressiveness in both speech and motion. When using expressive data, these systems often fail to produce satisfactory results. Recent studies have shown that using deep neural networks (DNNs) results in a better synthesis of head moti...
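For orientation, a bidirectional LSTM regressor of the general kind referred to can be sketched in PyTorch as below: it maps a sequence of acoustic feature frames to per-frame head rotations. The feature dimension, layer sizes and three-angle output are assumptions, and the stacked bottleneck feature front-end from the paper is not reproduced.

```python
# Illustrative BLSTM regressor: acoustic feature sequence -> per-frame head
# rotations (pitch, yaw, roll). Sizes are assumptions for illustration only.
import torch
import torch.nn as nn

class BLSTMHeadMotion(nn.Module):
    def __init__(self, feat_dim=40, hidden=128, out_dim=3):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, num_layers=2,
                            batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, out_dim)

    def forward(self, x):              # x: (batch, frames, feat_dim)
        h, _ = self.lstm(x)            # h: (batch, frames, 2 * hidden)
        return self.out(h)             # (batch, frames, 3)

model = BLSTMHeadMotion()
dummy = torch.randn(2, 100, 40)        # 2 utterances, 100 frames each
print(model(dummy).shape)              # torch.Size([2, 100, 3])
```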

Speech-driven head motion synthesis using neural networks

This paper presents a neural network approach for speech-driven head motion synthesis, which can automatically predict a speaker’s head movement from his/her speech. Specifically, we realize speech-to-head-motion mapping by learning a multi-layer perceptron from audio-visual broadcast news data. First, we show that a generatively pre-trained neural network significantly outperforms a randomly i...
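As a toy version of such a speech-to-head-motion perceptron, the sketch below fits scikit-learn's MLPRegressor on random arrays that stand in for frame-level acoustic features and head rotation targets; the generative pre-training described in the abstract is omitted, and all shapes are hypothetical.

```python
# Toy MLP regression from per-frame acoustic features to head motion
# parameters. The arrays are random stand-ins, not real audio-visual data.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 40))   # e.g. 40-dim acoustic features per frame
Y = rng.normal(size=(5000, 3))    # e.g. head rotation (pitch, yaw, roll)

mlp = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=100)
mlp.fit(X, Y)
print(mlp.predict(X[:1]))          # predicted head motion for one frame
```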

Low-level Characterization of Expressive Head Motion through Frequency Domain Analysis

For the purpose of understanding how head motions contribute to the perception of emotion in an utterance, we aim to examine the perception of emotion based on Fourier transform-based static and dynamic features of head motion. Our work is to conduct intra-related objective analysis and perceptual experiments on the link between the perception of emotion and the static/dynamic features. The obj...
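The sketch below computes a few static statistics and Fourier-based dynamic descriptors for a single head rotation trajectory with NumPy; the specific descriptors and the 30 fps frame rate are illustrative assumptions, not the feature set used in the study.

```python
# Illustrative static and frequency-domain descriptors of one head rotation
# trajectory. Descriptor choices and frame rate are assumptions.
import numpy as np

def head_motion_descriptors(angles, fps=30):
    """angles: 1-D array, one head rotation angle (e.g. yaw) per video frame."""
    static = {"mean": float(angles.mean()),
              "std": float(angles.std()),
              "range": float(angles.max() - angles.min())}
    spectrum = np.abs(np.fft.rfft(angles - angles.mean()))
    freqs = np.fft.rfftfreq(len(angles), d=1.0 / fps)
    dynamic = {"dominant_freq_hz": float(freqs[spectrum.argmax()]),
               "spectral_energy": float((spectrum ** 2).sum())}
    return static, dynamic

yaw = 5.0 * np.sin(np.linspace(0, 8 * np.pi, 300))   # synthetic 10 s trajectory
print(head_motion_descriptors(yaw))
```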

Transforming neutral visual speech into expressive visual speech

We present a method for transforming neutral visual speech sequences into realistic expressive visual speech sequences. By applying Independent Component Analysis (ICA) to visual features extracted from time aligned neutral and equivalent expressive sequences, a model that separates speech from expression can be learned. Analyzing the behavior of different speaking styles in terms of this model...
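As a rough illustration of the ICA step, the sketch below runs scikit-learn's FastICA on a random stand-in matrix of per-frame visual features; with real time-aligned neutral and expressive sequences, one would then inspect which independent components carry articulation and which carry expression. Dimensions and component counts are assumptions.

```python
# Illustrative ICA decomposition of a (frames x features) visual data matrix.
# Random data stand in for real time-aligned neutral/expressive sequences.
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
visual_feats = rng.normal(size=(600, 30))    # 600 aligned frames x 30 features

ica = FastICA(n_components=10, random_state=0)
sources = ica.fit_transform(visual_feats)    # (600, 10) independent components
mixing = ica.mixing_                         # (30, 10) mixing matrix
print(sources.shape, mixing.shape)
```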

Journal:

Volume   Issue

Pages   -

Publication date: 2007